Annealing Techniques For Unsupervised Statistical Language Learning

Authors

  • Noah A. Smith
  • Jason Eisner
Abstract

Exploiting unannotated natural language data is hard largely because unsupervised parameter estimation is hard. We describe deterministic annealing (Rose et al., 1990) as an appealing alternative to the Expectation-Maximization algorithm (Dempster et al., 1977). Seeking to avoid search error, DA begins by globally maximizing an easy concave function and maintains a local maximum as it gradually morphs the function into the desired non-concave likelihood function. Applying DA to parsing and tagging models is shown to be straightforward; significant improvements over EM are shown on a part-of-speech tagging task. We describe a variant, skewed DA, which can incorporate a good initializer when it is available, and show significant improvements over EM on a grammar induction task.
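
The annealing idea in the abstract can be illustrated with a minimal sketch. It is not the tagging or parsing setup from the paper: it assumes a toy one-dimensional mixture of two unit-variance Gaussians with equal weights, and it uses one common formulation of DA in which the E-step posteriors are raised to a power beta that grows from near 0 (an easy, nearly uniform problem) to 1 (ordinary EM). The function names, the schedule parameters `beta0` and `alpha`, and the small `jitter` used to break symmetry are all illustrative assumptions.

```python
import math

def annealed_posteriors(x, means, beta, sigma=1.0):
    """E-step for one point: component posteriors with joint scores raised to beta."""
    # Log joint score per component, up to a constant shared by all components
    # (equal mixing weights and a fixed sigma are assumed), scaled by beta.
    logs = [-beta * (x - m) ** 2 / (2.0 * sigma ** 2) for m in means]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def em_step(data, means, beta):
    """One annealed EM iteration: tempered E-step, then re-estimate the means."""
    post = [annealed_posteriors(x, means, beta) for x in data]
    new_means = []
    for k, old in enumerate(means):
        mass = sum(p[k] for p in post)
        if mass > 0.0:
            new_means.append(sum(p[k] * x for p, x in zip(post, data)) / mass)
        else:
            new_means.append(old)  # component received no mass; keep its mean
    return new_means

def da_em(data, init_means, beta0=0.01, alpha=1.5, inner_iters=50, jitter=1e-3):
    """Run annealed EM while raising beta geometrically until it reaches 1."""
    means = list(init_means)
    beta = beta0
    while True:
        for _ in range(inner_iters):
            means = em_step(data, means, beta)
        if beta >= 1.0:
            return means  # the last pass was plain EM (beta == 1)
        beta = min(1.0, alpha * beta)
        # Assumption: nudge the means apart slightly so that components which
        # collapsed onto each other at low beta can separate again.
        center = (len(means) - 1) / 2.0
        means = [m + jitter * (k - center) for k, m in enumerate(means)]

if __name__ == "__main__":
    points = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]
    print(da_em(points, [0.0, 0.1]))
```

At small beta the posteriors are nearly uniform, so both means sit near the overall data mean regardless of the initializer; the components separate only once beta is large enough, which is the sense in which the easy problem is gradually morphed into the original one.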

Similar resources

On Unsupervised Learning of Mixtures of Markov Sources (Thesis submitted for the degree "Master of Science")

Unsupervised classification, or clustering, is one of the basic problems in data analysis. While the problem of unsupervised classification of independent random variables has been deeply investigated, the problem of unsupervised classification of dependent random variables, and in particular the problem of segmentation of mixtures of Markov sources, has been hardly addressed. At the same time sup...

Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into gram...

Multiscale Annealing for Real-Time Unsupervised Texture Segmentation

We derive real-time global optimization methods for several clustering optimization problems commonly used in unsupervised texture segmentation. Speed is achieved by exploiting the image neighborhood relation of features to design a multiscale optimization technique, while accuracy and global optimization properties are gained using annealing techniques. Coarse grained cost functions are derive...

Learning Constructions of Natural Language: Statistical Models and Evaluations

Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi. Author: Sami Virpioja. Name of the doctoral dissertation: Learning Constructions of Natural Language: Statistical Models and Evaluations. Publisher: School of Science. Unit: Department of Information and Computer Science. Series: Aalto University publication series DOCTORAL DISSERTATIONS 158/2012. Field of research: Computer and Information Sci...

Robust Unsupervised Clustering Using Generalized Annealing M-estimator

A new robust clustering algorithm, called generalized annealing M-estimator (GAM-estimator), is proposed. Initialized with multiple seeds, the GAM-estimator converges to several optimal cluster centers. Neither knowledge about the number of clusters nor scale is needed. The global optimal solution of clustering is achieved by minimization of an objective function. The algorithm is applied to un...

Publication date: 2004